5.3.7.5 Normalization
After filtering out the lowly expressed genes, we can normalize the count data. edgeR uses
the trimmed mean of M-values (TMM) method to compute a normalization factor for each sample
that corrects for sample-specific composition biases. Without normalization, if only a few
genes are very highly expressed in a sample, those genes will account for a substantial
proportion of that sample's library size, causing the other genes to appear under-represented.
The library size is multiplied by the normalization factor to yield the effective library
size, which replaces the raw library size in downstream calculations. The following function
calculates the TMM normalization factors:
# Compute TMM normalization factors (stored in yNorm$samples$norm.factors)
yNorm <- calcNormFactors(y)
Notice that, as shown in Figure 5.11, the normalization factor of each sample has changed
from its initial value of 1.
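The relationship between library size, normalization factor, and effective library size can be checked with a small sketch. The count matrix below is simulated toy data, not the dataset used in this chapter, and the names yToy, yToyNorm, and effLibSize are hypothetical names chosen for illustration. A few genes in one sample are deliberately inflated to mimic the composition bias described above.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 1000 genes x 4 samples (an assumption, not the chapter's data)
counts <- matrix(rnbinom(4000, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("S", 1:4)))

# Inflate ten genes in sample S4 so they take up a large share of its library
counts[1:10, "S4"] <- counts[1:10, "S4"] * 200

# Build the DGEList; norm.factors start at 1 for every sample
yToy <- DGEList(counts = counts, group = factor(c("A", "A", "B", "B")))

# Compute TMM normalization factors
yToyNorm <- calcNormFactors(yToy, method = "TMM")

# Effective library size = raw library size * normalization factor
effLibSize <- yToyNorm$samples$lib.size * yToyNorm$samples$norm.factors
print(cbind(yToyNorm$samples, eff.lib.size = effLibSize))

# cpm() on a DGEList uses the effective library sizes
head(cpm(yToyNorm))
```

Comparing the norm.factors column before and after calcNormFactors() shows which samples were adjusted. edgeR scales the factors so that their product across samples is 1.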
5.3.7.6 Estimating Dispersions
The next step is to use the normalized count data to estimate the dispersions, which
will be used to estimate the parameters of the negative binomial model, as discussed above.
Because there are only a few replicates or samples, estimating the gene-wise dispersions
solely from each gene's count vector across replicates will not be accurate. edgeR therefore
shares information between genes to estimate dispersion; genes of similar abundance will
FIGURE 5.10 DGEList object after filtering out genes with low gene expression.